---
title: "Midterm"
author: "Jamie Zhang"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: sketchy
      navbar-bg: "#42033D"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(tidyverse)
library(plotly)
library(DT)
```
Overview
===
Column {data-width=350}
---
We examined the <span Style="color:#854798">daily average PM2.5</span> data and created some graphs to go along with it.
Here is a glimpse of the dataset.
Column {data-width=550}
---
### Data
```{r}
pm <- read_csv("avgpm25.csv")
# Show the first 500 rows with readable column headers
datatable(pm[1:500, ], rownames = FALSE, colnames = c("PM2.5", "Fips", "Region", "Longitude", "Latitude"), options = list(pageLength = 20))
```
Box Plot
===
Column {data-width=350}
---
Here is a <span Style="color:#854798">box plot</span> of the PM2.5 data.
### Analysis
Apart from the outliers, the distribution is roughly symmetric. There are more outliers on the right, and they lie farther from the median.
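
The outlier counts on each side can also be checked numerically. The sketch below is only an illustration on a made-up vector (`pm25_demo` is hypothetical, not values from `avgpm25.csv`). It uses base R's `boxplot.stats()`, which flags points more than 1.5 times the box length beyond the hinges, a rule close to (though not exactly the same as) the one `geom_boxplot()` applies by default.

```r
# Hypothetical PM2.5 values for illustration only (not from avgpm25.csv).
pm25_demo <- c(3.4, 8.1, 9.0, 9.6, 10.0, 10.3, 10.9, 11.5, 12.2, 17.4, 18.4)

# boxplot.stats() returns whisker ends, hinges, and the median in $stats,
# and any points beyond 1.5 times the box length from the hinges in $out.
stats <- boxplot.stats(pm25_demo)

med <- stats$stats[3]
low_outliers  <- stats$out[stats$out < med]
high_outliers <- stats$out[stats$out > med]

cat("Median:", med, "\n")
cat("Outliers below the box:", length(low_outliers), "\n")
cat("Outliers above the box:", length(high_outliers), "\n")
```

Running the same steps on `pm$pm25` would give the counts behind the statement about outliers on the right.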
Column {data-width=550}
---
### Box Plot
```{r}
# Refer to columns by name inside aes(); pm$pm25 bypasses ggplot's data handling
ggplot(pm, aes(x = pm25)) +
  geom_boxplot(fill = "#680E4B", color = "gray10") +
  labs(title = "PM2.5", x = "µg/m^3") +
  ylim(-0.6, 0.6)
```
Air Quality
===
Column {data-width=350}
---
Note that although the current national ambient air quality standard is 12 micrograms per cubic meter, it used to be 15.
### Analysis
Fresno, Kern, Kings, Los Angeles, Merced, Riverside, Stanislaus, and Tulare counties all exceed the former air quality standard of 15 micrograms per cubic meter. All of these counties are in California.
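
A list like this can be produced by filtering on the threshold and sorting from highest to lowest. The sketch below is a minimal base R illustration on made-up rows; the `demo` data frame and its FIPS codes are hypothetical, and only the column names `fips` and `pm25` are assumed to match the dataset.

```r
# Hypothetical rows for illustration only; the real data come from avgpm25.csv.
demo <- data.frame(
  fips = c("00001", "00002", "00003", "00004"),
  pm25 = c(9.8, 16.2, 12.5, 18.4),
  stringsAsFactors = FALSE
)

old_standard <- 15  # former standard, in micrograms per cubic meter

# Keep only rows above the standard, then sort from highest to lowest.
above <- demo[demo$pm25 > old_standard, ]
above <- above[order(above$pm25, decreasing = TRUE), ]

print(above)
```

Filtering on the threshold, rather than taking a fixed number of rows, keeps the table correct if the data or the cutoff changes.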
Column {data-width=550}
---
### Data
```{r}
pm %>%
  arrange(desc(pm25)) %>%  # sort highest PM2.5 first; ascending order would list the cleanest counties
  head(8) %>%
  datatable(rownames = FALSE, colnames = c("PM2.5", "Fips", "Region", "Longitude", "Latitude"), options = list(pageLength = 8))
```
Box/Violin Plots
===
Column {data-width=350}
---
Here are two side-by-side plots to explore the difference in PM2.5 levels between the eastern and western U.S.
### Analysis
Eastern values cluster more tightly around the median, while the west has more outliers. Western values clump toward the lower end of the range, and eastern values clump toward the higher end.
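
The same comparison can be summarized with a median and interquartile range per region. This is a minimal sketch on made-up numbers (the `demo` data frame is hypothetical, not taken from `avgpm25.csv`); it assumes the region labels are `east` and `west`.

```r
# Hypothetical values for illustration only (not from avgpm25.csv).
demo <- data.frame(
  region = c("east", "east", "east", "east", "west", "west", "west", "west"),
  pm25   = c(10.5, 11.0, 11.5, 12.0, 5.0, 7.0, 9.0, 17.0)
)

# Median and interquartile range of PM2.5 within each region.
region_median <- tapply(demo$pm25, demo$region, median)
region_iqr    <- tapply(demo$pm25, demo$region, IQR)

print(rbind(median = region_median, IQR = region_iqr))
```

A smaller IQR corresponds to a narrower box, and a higher median to a box sitting further right in the plots.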
Column {.tabset data-width=550}
---
### Box Plots
```{r}
# Use bare column names in aes() rather than pm$...
ggplot(pm, aes(x = pm25, y = region)) +
  geom_boxplot(fill = "#680E4B", color = "gray10") +
  labs(y = "Region", x = "PM2.5")
```
### Violin Plots
```{r}
# Use bare column names in aes() rather than pm$...
ggplot(pm, aes(x = pm25, y = region)) +
  geom_violin(fill = "#680E4B", color = "gray10") +
  labs(y = "Region", x = "PM2.5")
```
Histograms
===
Column {data-width=350}
---
### Analysis
Some values exceed the former standard of 15.
Eastern values cluster more tightly around the median, while the west has more outliers. Western values clump toward the lower end of the range, and eastern values clump toward the higher end.
Column {.tabset data-width=550}
---
### Histogram with Cutoff
Here is a histogram of the PM2.5 data. The vertical line on the histogram shows the cutoff value.
```{r}
# ggplot2 is already attached by library(tidyverse) in the setup chunk
ggplot(pm, aes(x = pm25)) +
  geom_histogram(binwidth = 1, fill = "#680E4B", color = "gray10") +
  geom_vline(xintercept = 15, color = "#854798") +
  # annotate() draws the label once; geom_text() with aes() redraws it for every row
  annotate("text", x = 15, y = 10, label = "Cutoff: 15", color = "#854798", vjust = -0.5, hjust = 0) +
  labs(x = "PM2.5", y = "Frequency")
```
### Histogram by Region
Here is the PM2.5 data split into two histograms, one per region.
```{r}
# Bare column names are required here: pm$pm25 does not get split by facet
ggplot(pm, aes(x = pm25)) +
  geom_histogram(binwidth = 1, fill = "#680E4B", color = "gray10") +
  labs(x = "PM2.5") +
  facet_wrap(~region, nrow = 2)
```
Scatterplots
===
Column {data-width=350}
---
### Analysis
The eastern data show less deviation than the western data.
Column {.tabset data-width=550}
---
### Scatterplot1
```{r}
# Use bare column names in aes() rather than pm$...
ggplot(pm, aes(x = pm25, y = latitude, color = region)) +
  geom_point() +
  labs(x = "PM2.5", y = "Latitude", color = "Region")
```
### Scatterplot2
```{r}
# Bare column names are required here: pm$... vectors do not get split by facet
ggplot(pm, aes(x = pm25, y = latitude)) +
  geom_point(color = "#680E4B") +
  labs(x = "PM2.5", y = "Latitude") +
  facet_wrap(~region, ncol = 2)
```
Correlogram
===
Column {data-width=350}
---
### Analysis
PM2.5 is highly correlated with longitude but only weakly correlated with latitude, as indicated by the size of the pie and the shade of the color.
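
The pies and shading stand for correlation coefficients, which can also be printed directly with base R's `cor()`. The sketch below is an illustration on made-up numbers (the `demo` data frame is hypothetical, not taken from `avgpm25.csv`) and assumes the Pearson coefficient is the quantity of interest.

```r
# Hypothetical values for illustration only (not from avgpm25.csv).
demo <- data.frame(
  pm25      = c(6.0, 8.0, 9.5, 11.0, 12.5),
  longitude = c(-120, -110, -100, -90, -80),
  latitude  = c(40, 35, 42, 33, 41)
)

# Pearson correlation matrix; the pm25 row shows how strongly PM2.5
# moves with each coordinate (values near 0 indicate a weak linear link).
cor_matrix <- cor(demo)
print(round(cor_matrix["pm25", ], 2))
```

Applying `cor()` to the `pm25`, `longitude`, and `latitude` columns of `pm` would give the numbers to quote alongside the plot.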
Column {data-width=550}
---
### Correlogram
```{r}
library(corrgram)
pm %>%
  select(pm25, longitude, latitude) %>%
  corrgram(order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt)
```